Abstract. The analysis of microwave observations over land to determine
atmospheric and surface parameters is still limited due to the complexity of the inverse
problem. Neural network techniques have already proved successful as the basis 
of efficient retrieval methods for non-linear cases; however, first-guess estimates,
which are used in variational methods to avoid problems of solution non-uniqueness 
or other forms of solution irregularity, have up to now not been used with neural 
network methods. In this study, a neural network approach is developed that uses 
a first-guess. Conceptual bridges are established between the neural network and 
variational methods. The new neural method retrieves the surface skin temperature, 
the integrated water vapor content, the cloud liquid water path and the microwave 
surface emissivities between 19 and 85 GHz over land from SSM/I observations. The 
retrieval, in parallel, of all these quantities improves the results for consistency
reasons. A data base to train the neural network is calculated with a radiative transfer
model and a global collection of coincident surface and atmospheric parameters
extracted from the National Centers for Environmental Prediction reanalysis, from
the International Satellite Cloud Climatology Project data and from microwave 
emissivity atlases previously calculated. The results of the neural network inversion 
are very encouraging. The r.m.s. error of the surface temperature retrieval over 
the globe is 1.3 K in clear sky conditions and 1.6 K in cloudy scenes. Water vapor 
is retrieved with an r.m.s. error of 3.8 kg/m² in clear conditions and 4.9 kg/m² in
cloudy situations. The r.m.s. error in cloud liquid water path is 0.08 kg/m². The
surface emissivities are retrieved with an accuracy of better than 0.008 in clear 
conditions and 0.010 in cloudy conditions. Microwave land surface temperature 
retrieval presents a very attractive complement to the infra-red estimates in cloudy 
areas: a time record of land surface temperature will be produced.


1. Introduction 

Even after 20 years of global microwave satellite ob- 
servations, the use of microwave data over land for the 
retrieval of atmospheric and surface parameters is still 
very limited. While the ocean surface has a low
microwave emissivity (~0.5) that produces good contrast
of atmospheric phenomena against a low brightness



very sensitive to the presence of thin clouds. The sen- 
sitivity of SSM/I to water vapor is very low, except 
in the most arid areas; so the results do not improve 
on the first-guess values. With an estimated accuracy
of ~0.1 kg/m², the SSM/I retrieval does not properly
characterize the thinner clouds (the majority) but the 
cloud structures with higher liquid water content are 
well delineated. 

A further improvement in this variational inversion 
scheme could be obtained by also retrieving the seven 
surface emissivities as they undergo small day-to-day 
changes induced by variations of the soil moisture, the 
vegetation density, or the snow cover. However, in this 
case, ten variables would have to be retrieved (Ts, WV, 
LWP plus the seven emissivities Em_i, where i represents
the 7 channels of SSM/I: 19 GHz V, 19 GHz H,
22 GHz V, 37 GHz V, 37 GHz H, 85 GHz V and 85 
GHz H; V for vertical polarization and H for horizontal 
polarization) from the seven SSM/I brightness temper- 
atures and additional information would be needed to 
solve the problem. The monthly-mean emissivity values 
previously computed could be used as first-guess (or, us- 
ing more specifically the variational assimilation formal- 
ism, the background) estimates of the surface emissivity 
and the first-guess matrix of error covariances could be 
calculated. There are several options: The covariance 
matrix could be calculated globally for a given month, 
estimated for a given type of surface, or even calcu- 
lated for each single pixel considering all the monthly 
mean emissivities for this pixel. The inversion scheme 
would then rely very heavily on the representativity of 
such covariance matrices, giving an important weight 
to the statistical description of the emissivity relation- 
ships. Given this difficulty with the retrieval of the 
surface emissivities with a variational method, another 
inversion approach is considered. 

Neural network techniques have already proved very
successful in the development of computationally
efficient inversion methods for satellite data and for
geophysical applications [Escobar et al., 1993; Aires et al.,
1998; Chevallier et al., 2000]. They are well adapted to
solve non-linear problems and are especially designed 
to capitalize on the inherent statistical relationships 
among the retrieved parameters. Note that variational 
techniques, as usually implemented, do not account for 
correlations among the retrieved parameters. However, 
for many ill-conditioned problems, the use of a first- 
guess estimate is very important to regularize the in- 
version process and the first-guess error covariance ma- 
trix is also essential in 3D/4D variational assimilation 
schemes since it controls the impact of the observations 



This distance is dependent on the a priori information 
available on the probability distribution functions of the 
variables involved. If the observations y are assumed to 
be Gaussian distributed with zero-mean and without 
other a priori information, the Mahalanobis distance
[Crone and Crosby, 1995] is optimal 

$$ [y(x) - y^{o}]^{t} \, \langle y \cdot y^{t} \rangle^{-1} \, [y(x) - y^{o}] $$

where <y · y^t> is the covariance matrix of the
observable quantities without measurement noise, y. This
procedure has to be applied to find an optimum so- 
lution for each observation separately and can require 
significant computational resources. 
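
As a minimal sketch of the localized criterion, the quadratic form above can be evaluated as follows. The function names and the two-channel numbers used in any call are illustrative assumptions, not values from this study; the linear system is solved by plain Gaussian elimination only to keep the example free of external libraries.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def mahalanobis_sq(y_sim: Vector, y_obs: Vector, cov: Matrix) -> float:
    """Quadratic form [y - y_o]^t C^{-1} [y - y_o] for covariance C."""
    d = [s - o for s, o in zip(y_sim, y_obs)]
    z = solve(cov, d)  # z = C^{-1} d, without forming the inverse explicitly
    return sum(di * zi for di, zi in zip(d, z))
```

In a localized inversion this quantity would be re-evaluated at every iteration and for every observation, which is the computational burden mentioned above.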

The second approach consists in estimating a general
mapping function g_W, with parameters W, that is a
global model for y^{-1}. The parameters W are the result
of the minimization of a cost function

$$ \int D(\hat{x}, x) \, P(x, \eta) \, dx \, d\eta , \qquad (4) $$

where x̂ = g_W(y°) = g_W(y(x) + η) and P is the
probability distribution of the physical variables x and
the noise η. The distance D(x̂, x) is integrated over the
physical states and over the observation noise, so that
the model g_W is optimized globally over the range of
x and the noise. In practice, to minimize the previous
criterion, a data base is created, composed of a sta- 
tistically representative sample of coincident variables 
x and observations y°, and the estimation of the
parameters W is made once and for all using this dataset.
These schemes are called “global” inversions. After this
preliminary step for the estimation of W, the inversion
of an observation is very fast since it involves only the
direct use of the model g_W.

The distances used for localized and global inversion
schemes involve different variables. The first one works
in brightness temperature space, the second one in
physical variable space. The optimum solution in (4)
gives an estimate x̂ that is close to the true solution
x, while the distance in (2) specifies that the brightness
temperatures y(x̂), associated with the estimated
solution x̂, are close to the brightness temperatures y(x)
associated with the real solution x.

Inverse problems are often ill-posed since the exis- 
tence and the uniqueness or the stability of the solution 
is not always known [Vapnik, 1997]. This is especially 
the case when the “forward” model, y(x), is not
linear; in our case the radiative transfer is not linear. To
regularize the inversion process, all a priori informa- 
tion available should be used to constrain the solution, 


This equation is the only computation required in the 
operational mode (once the synaptic weights have been 
determined by the training procedure). A bias term for 
each neuron has been deliberately omitted to simplify 
the notation, even if it is used in the neural network. 

It has been demonstrated [Hornik et al., 1989;
Cybenko, 1989] that any continuous function can be
approximated to arbitrary accuracy by a one-hidden-layer
MLP with sigmoid functions σ.
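
The forward pass of such a one-hidden-layer MLP can be sketched as below. This is an illustration under stated assumptions rather than the network used in this study: the output layer is taken to be linear, the bias terms are omitted as in the notation above, and the names `mlp_forward`, `w_hidden` and `w_out` are hypothetical.

```python
import math
from typing import List


def sigmoid(a: float) -> float:
    """Logistic transfer function, written to avoid overflow for large |a|."""
    if a >= 0.0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def mlp_forward(y: List[float],
                w_hidden: List[List[float]],
                w_out: List[List[float]]) -> List[float]:
    """Map inputs y to outputs through one sigmoid hidden layer.

    w_hidden[j][i] links input i to hidden neuron j;
    w_out[k][j] links hidden neuron j to output k (linear output assumed).
    """
    hidden = [sigmoid(sum(w * yi for w, yi in zip(row, y))) for row in w_hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
```

Once the weights are fixed, a retrieval amounts to one call of `mlp_forward`, which is why the operational cost is so low.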

2.2.2. Optimization Algorithm: The Back-Propagation
of Errors. Given a neural architecture
(functions used as transfer functions σ, number of
layers, neurons and connections), all the information of the
network is contained in the set of all synaptic weights
W = {w_ij}. The learning algorithm is an optimization
technique that estimates the optimal network parameters
W by minimizing a cost function C_1(W), approaching
as closely as possible the desired function. The
criterion usually used to derive W is the mean square error
in network outputs

$$ C_1(W) = \frac{1}{2} \sum_{k \in S_2} \iint D_E\big(\hat{x}_k(Y; W), x_k\big)^2 \, P(Y, x_k) \, dx_k \, dY \qquad (8) $$

where D_E is the Euclidean distance between x_k, the
kth desired output component, and x̂_k, the kth neural
network output component, and S_2 is the output layer
of the neural network. Other contrast measures can
be used if a priori information is available. P(Y, x_k) is
the joint probability distribution function of Y and x_k.
This criterion is just the integrated distance between x̂
and x introduced in (4).

In practice, the probability distribution function, P(x_k, Y),
is sampled in a dataset B = {(Y^e, x_k^e), e = 1, ..., N}
of N input/output couples, and C_1(W) is then
approximated by the classical least-squares criterion:

$$ \tilde{C}_1(W) = \frac{1}{2N} \sum_{e=1}^{N} \sum_{k \in S_2} D_E\big(\hat{x}_k(Y^e; W), x_k^e\big)^2 \qquad (9) $$
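
A minimal sketch of this practical least-squares criterion is given below. It assumes the 1/(2N) normalization used in criterion (13), takes network outputs that have already been computed, and uses a hypothetical function name.

```python
from typing import List


def least_squares_criterion(outputs: List[List[float]],
                            targets: List[List[float]]) -> float:
    """Sum of squared output errors over the dataset, divided by 2N.

    outputs[e][k] is the kth network output for sample e;
    targets[e][k] is the corresponding desired value.
    """
    n = len(targets)
    total = 0.0
    for x_hat, x in zip(outputs, targets):
        # Squared Euclidean distance between output and target vectors.
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / (2.0 * n)
```

The learning algorithm adjusts the weights so that this number decreases over the training dataset.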

The Error Back-Propagation algorithm [Rumelhart
et al., 1986] is used to minimize C̃_1(W). It is a
gradient descent algorithm that is very well adapted to
the MLP hierarchical architecture because the
computational cost is linearly related to the number of
parameters. Traditional gradient descent algorithms use
all the samples of the dataset B to compute a mean
Jacobian of the criterion C̃_1(W) in equation (9). These
algorithms are called deterministic gradient descent. The
major inconvenience of this approach is that the de- 
scent can be trapped in local minima. In the present 
application, a stochastic gradient descent algorithm is 



radiative transfer equation globally, once and for all and 
uses the distribution P(x) for this purpose. This model 
is then valid for all observations (i.e. global inversion). 

To minimize this criterion, the neural network method
creates a dataset B = {(x^e, y^{oe}, x^{be}); e = 1, ..., N} that
samples as well as possible all these probability
distribution functions. Then, the practical criterion used during
the learning stage is given by:

$$ \tilde{C}_2(W) = \frac{1}{2N} \sum_{e=1}^{N} D_E\big(g_W(x^{be}, y^{oe}), x^e\big)^2 . \qquad (13) $$

To create the dataset B, we sample the probability
distribution function P(x) by selecting geophysical
states (x^e) that cover all natural combinations and
their correlations and by calculating y^e = y(x^e) with
the physical model (the radiative transfer model in this
case). Alternatively, we could obtain these relationships
from a “sufficiently large” set of colocated and
coincident values of y and x. For P_η we need a priori
information about the measurement noise characteristics;
a physical noise model could be used, but if
all we have is an estimate of the noise magnitude,
then we have to assume Gaussian distributed noise η
that is not correlated among the measurements. For
P(x^b|x) there are two situations. If a first-guess dataset
{x^{be}; e = 1, ..., N} exists, then x^{be} can be used
directly. If such a dataset is not available, we have to
determine P_ε(ε) (as is done in the variational assimilation
technique), the distribution of errors in the first-guess,
ε = x^b - x, and use x^b = x + ε as input to the network.
The balance between reliance on the first-guess and the
direct measurements is then made automatically and
optimally by the neural network during the training.
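
The construction of the training triplets (state, noisy observation, first-guess) described above can be sketched as follows. The forward model here is a made-up linear stand-in for the radiative transfer model, and all names and coefficients are hypothetical; only the sampling logic (uncorrelated Gaussian measurement noise, first-guess formed as the true state plus a Gaussian error) follows the text.

```python
import random
from typing import List, Tuple


def toy_forward_model(x: List[float]) -> List[float]:
    """Hypothetical stand-in for the radiative transfer model y(x)."""
    return [2.0 * x[0] + 0.5 * x[1], x[0] - x[1]]


def build_training_set(states: List[List[float]],
                       noise_std: float,
                       fg_std: List[float],
                       seed: int = 0
                       ) -> List[Tuple[List[float], List[float], List[float]]]:
    """Return one (x, y_obs, x_b) triplet per geophysical state x."""
    rng = random.Random(seed)
    dataset = []
    for x in states:
        # Observation: forward model plus uncorrelated Gaussian noise.
        y_obs = [y + rng.gauss(0.0, noise_std) for y in toy_forward_model(x)]
        # First-guess: true state plus an independent Gaussian error.
        x_b = [xi + rng.gauss(0.0, s) for xi, s in zip(x, fg_std)]
        dataset.append((x, y_obs, x_b))
    return dataset
```

During training, `y_obs` and `x_b` would be concatenated as network inputs and `x` used as the target.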

Table 1 summarizes the specific features of the neu- 
ral network scheme with first-guess and the variational 
assimilation inversion technique. 

3. Generation of a Data Base to Train 
the Neural Network 

Two neural networks are trained: one for clear scenes
(NN1), one for cloudy scenes (NN2). They both retrieve
simultaneously the surface temperature Ts, the
integrated water vapor content WV, and the seven SSM/I
surface emissivities Em_i. In addition to these
parameters, NN2 retrieves the cloud liquid water path
LWP. Two sources of information are used for this 
purpose: (1) seven SSM/I brightness temperatures (ob- 
servations), and (2) a priori information of the state 
of the surface and atmospheric variables from ancillary 
datasets. In this study the experimental configuration is 

information in the retrieval.

3.1.2. The use of the ISCCP dataset. In the 
ISCCP data, cloud parameters and related quantities 
are retrieved from visible (VIS, ~0.6 µm wavelength)
and infrared (IR, ~11 µm wavelength) radiances
provided by the set of polar and geostationary meteorological
satellites [Rossow and Schiffer, 1999]. The ISCCP
dataset is used in this study to discriminate between 
clear and cloudy scenes (selecting NN1 or NN2) and 
to give estimates of the cloud top temperature and sur- 
face skin temperatures. The pixel level dataset (the DX 
dataset) is selected for its spatial sampling of about 30 
km and its sampling interval of 3 hours [Rossow et al.,
1996].

3.1.2.1. The surface temperature first-guess:
ISCCP provides the surface skin temperature first-guess 
retrieved from IR radiances under clear conditions. The 
IR emissivity of the surface is always close to 1 and 
varies with the land surface type as in the GISS GCM. 
Instead of selecting the closest-in-time DX image to de- 
rive the surface temperature, a linear interpolation be- 
tween two ISCCP surface temperature estimates to the 
precise time of the SSM/I overpass is calculated to ac- 
count for the diurnal cycle. If the ISCCP DX scenes are 
cloudy, a clear sky compositing procedure is conducted 
within the ISCCP process to derive an estimation of 
the surface temperature (see Rossow and Garder [1993] 
for more details). The error associated with the
surface temperature is estimated to be 4 K [Rossow and
Garder, 1993].
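
The time interpolation of the first-guess surface temperature can be illustrated with the short sketch below. The function name and the example times are hypothetical; the only element taken from the text is the linear interpolation between the two ISCCP estimates that bracket the SSM/I overpass.

```python
def interpolate_ts(t0: float, ts0: float,
                   t1: float, ts1: float,
                   t_overpass: float) -> float:
    """Linearly interpolate surface temperature (K) to the overpass time.

    t0 and t1 are the times (hours) of the two bracketing ISCCP estimates
    ts0 and ts1; t_overpass must lie between them.
    """
    if t0 >= t1 or not (t0 <= t_overpass <= t1):
        raise ValueError("overpass time must lie within [t0, t1] with t0 < t1")
    w = (t_overpass - t0) / (t1 - t0)
    return (1.0 - w) * ts0 + w * ts1
```

For instance, with estimates of 290 K at 09:00 and 296 K at 12:00, an overpass at 10:00 would receive a first-guess one third of the way between the two values.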

3.1.2.2. Cloud a priori information: First, the
ISCCP data help discriminate between clear and cloudy
scenes. Over the ocean, it has been shown that the 
VIS and IR observations have a better ability than 
the microwave measurements to detect clouds [Lin and 
Rossow , 1994]. Given that the sensitivity of the mi- 
crowave to clouds over land is much lower than over 
ocean, when a pixel is considered clear by the ISCCP 
procedure, the LWP is fixed to zero. Two neural
networks are used, one for clear scenes, another for cloudy
scenes. The ISCCP cloud flag directs the retrieval to
one network or the other. 

For cloudy scenes, the cloud top temperature derived 
from IR measurements is added to the retrieval process 
as additional information, to account for the changes 
in the emission temperature of the cloud and in the 
cloud liquid water absorption coefficient. In contrast 
to the ocean case, clouds induce only small variations 
in the microwave radiation over land and additional 
cloud information facilitates their detection. Prigent 
and Rossow [1999] showed that the ability to estimate 



lengths, has been recently developed [Pardo et al., 2000]
but is not used in this study because the differences at
SSM/I frequencies are negligible.

Cloud absorption is calculated using the Rayleigh ap- 
proximation which is valid for most non-precipitating 
liquid water clouds at SSM/I frequencies. The cloud 
temperature is assumed to be equal to the air
temperature at the same level. The dielectric properties of
liquid water are taken from Manabe et al. [1987].
Scattering by large particles is not considered, meaning that
convective clouds and rain are not represented in the
data base.

The surface contribution is calculated using the
monthly-mean emissivities previously calculated [Prigent et al.,
1997, 1998] and assuming specular reflection at the
surface.

The consistency of the radiative transfer model has 
been checked. Observed brightness temperatures and
simulated Tbs using the ISCCP Ts and LWP, the
NCEP WV, and the monthly Em_i values have been
compared for two months of SSM/I data globally over snow
and ice-free pixels: For all channels, the bias is smaller
than 0.5 K even for cloudy cases. Thus, the training
dataset generated with this radiative transfer model and 
sources of global data accurately represents the distri- 
bution of these parameters that SSM/I observes. 


3.3. Statistical Analysis of the Training Data 
Base 


The training data base generated by the RT model 
applied to the ISCCP, NCEP and monthly Em_i datasets
contains the variables to be retrieved (Ts, WV, LWP,
and the seven Em_i), the seven simulated brightness
temperatures Tb, and a priori information on the cloud
top temperature Tc and the temperature of the lowest
layer of the atmosphere Ta. An error is associated with 
most variables that are used as first-guesses. The data 
base is produced from data collected for January and 
June 1993 over land between 60°S and 80°N. Snow or 
ice covered pixels are not considered: The snow and ice
information comes from the NOAA operational
analysis. A total of 1,391,671 samples are collected, 55% of
which correspond to cloudy scenes.

Figure 3 shows the “global” distributions of some of 
the variables in the training data base. The distribu- 
tions are non-Gaussian and some of them are truncated. 
Non-Gaussian distributions are often indicators of
non-linear behavior [Burgers and Stephenson, 1999; Palmer,
1999]. For example, the liquid water path distribution
has its maximum at the lowest values and obviously can- 
not be negative. When retrieving such a variable with 

perature with brightness temperature is higher for the
vertical polarization than the horizontal one. This can 
be explained by less variability in the emissivities for the 
vertical polarization than for the horizontal one. Cor- 
relations between surface temperature and brightness 
temperatures at vertical polarization are similar at all 
SSM/I frequencies, which was not anticipated. At 22 
and 85 GHz, water vapor absorption was expected to 
impede a direct relation between surface contributions 
and top of the atmosphere measurements and as a mat- 
ter of fact, derivatives of the brightness temperatures 
with surface temperature (sensitivities) are smaller at 
22 and 85 GHz than at 19 and 37 GHz [Prigent et al.,
1999]. Other authors have also observed large
correlations between surface air temperatures and
Tbs at 85 GHz. MacFarland et al. [1990] investigated
the correlation between SSM/I observations and
surface air temperature and concluded that 22 and 85 GHz
measurements, depending on the surface type, are the
most sensitive to the land surface temperature. Basist
et al. [1998] also proposed a method to retrieve
near-surface air temperature from SSM/I that relies heavily
on the 85 GHz channels. These results can be explained
by two factors. First, for a given polarization, the
surface emissivities at 19 GHz are more variable than at
other frequencies because of higher sensitivity to
surface properties like soil moisture or vegetation water
content and structure. These emissivity variations are
not correlated with surface temperature fluctuations,
as indicated by the correlation coefficients between Ts
and emissivity at 19 GHz (see Table 2). Second, the
absorption at 22 and 85 GHz actually damps the
effects of emissivity fluctuations, enhancing the
relationship between brightness temperatures and surface
temperature. The global correlation coefficients in Table 2
may not be representative on a local scale. However, 
correlation coefficients for Ts have been calculated for 
three ranges of atmospheric water vapor amount and 
emissivities and no significant differences were observed 
in the coefficients.

Global correlations between atmospheric water va- 
por and brightness temperatures are relatively low es- 
pecially for vertical polarization because of large surface 
emissivities reducing the contrast between atmospheric 
and surface emissions. Even for horizontal polarization, 
global correlations never exceed 0.6. However, these
global values mask large local differences. Correlation 
coefficients calculated for different ranges of emissivities 
and water vapor amounts show that the results are very 
different, especially for the 85 and 22 GHz channels de- 
pending on water vapor amount. As a consequence, the 



scribes the variety of the situations to be analyzed. 


4. Results from the Neural Network 
Inversions 

Two neural networks have been trained, one for clear 
pixels (NN1) the other one for cloudy pixels (NN2), 
both using a priori information. The ISCCP cloud flag 
discriminates between clear and cloudy pixels. The
architecture of the network NN1 is an MLP with 17 inputs
coding the 7 SSM/I observations y° and the first-guess
x^b (Ts, Ta, WV, and 7 Em_i), 30 neurons in the
hidden layer, and 9 neurons in the output layer coding
the retrieval x̂ (Ts, WV, and 7 Em_i). The number of
neurons in the hidden layer is estimated by a heuristic
procedure that monitors the generalization errors of
the neural network as the configuration is varied. The
network NN2 has one additional input, the cloud top
temperature Tc, and one additional retrieval, the
liquid water path (LWP). The input variables and their
associated standard deviation errors are summarized in
Table 3. The full matrix of the error covariances is
calculated at the end of the training phase (not shown
here). This matrix gives the statistical structure of the 
errors and is of great importance in the assimilation of 
retrieved products in a Numerical Weather Prediction 
scheme. 
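
As a quick worked check of the network sizes quoted above, the number of adjustable parameters of a one-hidden-layer MLP can be counted as below. The helper name is hypothetical, and the count assumes one bias per hidden and per output neuron, which is an assumption about the implementation rather than a figure given in the text.

```python
def mlp_parameter_count(n_in: int, n_hidden: int, n_out: int,
                        bias: bool = True) -> int:
    """Count synaptic weights (and optional biases) of a one-hidden-layer MLP."""
    weights = n_in * n_hidden + n_hidden * n_out
    biases = (n_hidden + n_out) if bias else 0
    return weights + biases


# NN1: 17 inputs, 30 hidden neurons, 9 outputs.
nn1_params = mlp_parameter_count(17, 30, 9)
# NN2: one additional input (Tc) and one additional output (LWP).
nn2_params = mlp_parameter_count(18, 30, 10)
```

Under these assumptions NN1 has 819 parameters and NN2 has 880, both far fewer than the 1,391,671 samples in the training data base.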

For each variable, the distribution of the first-guess 
error is a Gaussian truncated at 2 standard deviations. 
In contrast to the variational method, where only Gaussian
distributions can be used, the neural network method
can use any distribution shape. However, in the present
study, no in situ data are available to calculate the dis- 
tribution of the first-guess errors, so Gaussian noise is 
introduced independently on each variable to generate 
the first-guess. In the operational mode with real first- 
guesses, the technique will use the structure of the first- 
guess error correlations and the results should be even 
better. 
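
The generation of a simulated first-guess with Gaussian errors truncated at 2 standard deviations can be sketched as follows. The function names are hypothetical and rejection sampling is only one possible way to obtain the truncation; the 4 K value in the usage note is the ISCCP surface temperature error quoted earlier.

```python
import random
from typing import List


def truncated_gauss(rng: random.Random, sigma: float,
                    n_sigma: float = 2.0) -> float:
    """Draw a zero-mean Gaussian error, rejecting draws beyond n_sigma * sigma."""
    if sigma == 0.0:
        return 0.0
    while True:
        e = rng.gauss(0.0, sigma)
        if abs(e) <= n_sigma * sigma:
            return e


def make_first_guess(x_true: List[float], sigmas: List[float],
                     rng: random.Random) -> List[float]:
    """Perturb each variable independently to simulate a first-guess."""
    return [x + truncated_gauss(rng, s) for x, s in zip(x_true, sigmas)]
```

With `sigmas[0] = 4.0` for Ts, every simulated surface temperature first-guess lies within 8 K of the true value. The errors are drawn independently for each variable, as in the present study, so no first-guess error correlation is represented.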

Figure 4 presents the learning curves of the neural 
network for clear and cloudy situations with and with- 
out first-guesses. To measure the impact of the intro- 
duction of the first-guess information in a neural net- 
work inversion scheme, two additional networks have 
been trained without first-guesses, one for clear
conditions and another for cloudy scenes. The
architectures of the networks without first-guess are similar
in structure, except that there are only seven inputs,
coding the SSM/I observations y°. For each retrieved
variable, the r.m.s. error decreases from the first-guess
r.m.s. error to a stable value after several iterations. 
The networks with first-guess input show substantially 

exploit the spectral dependence of the first-guess
emissivities to provide a more accurate estimate of both the
emissivities and the surface temperature. 

4.2. Water Vapor 

WV is retrieved with a relative error of 30% for
both clear and cloudy situations, when using a
first-guess. This is a small improvement over the first-guess
r.m.s. error of 40%. The errors are not significantly
different in the presence of clouds. With the variational
method [Prigent and Rossow, 1999], the retrieval
errors were found to increase with decreasing emissivities
and to increase in the presence of clouds, as expected from
the sensitivity of the radiative transfer to the various 
parameters. As observed in Table 2, the correlation be- 
tween the brightness temperatures and WV is rather 
low (maximum of ~0.6 globally), and the neural net- 
work scheme is likely to exploit water vapor correlation 
with another variable to extract water vapor informa- 
tion when direct correlation between Tbs and WV is 
not sufficient. It is worth mentioning that the neural
network is trained to minimize the absolute WV error,
not the relative error in WV. Changes could
be made to minimize the relative error if this option 
were preferred. 
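
The difference between the two options can be illustrated with the short sketch below. Both function names and the small `floor` guard against division by zero are assumptions made for illustration; the actual training criterion is the squared Euclidean distance described in Section 2.

```python
def squared_error(wv_hat: float, wv: float) -> float:
    """Squared absolute error on water vapor (kg/m²)², as minimized here."""
    return (wv_hat - wv) ** 2


def relative_squared_error(wv_hat: float, wv: float,
                           floor: float = 1e-6) -> float:
    """Squared error relative to the true value (dimensionless alternative)."""
    return ((wv_hat - wv) / max(abs(wv), floor)) ** 2
```

A 3 kg/m² error costs the same under the absolute criterion whether the true value is 10 or 50 kg/m², whereas the relative criterion penalizes it 25 times more in the drier case, so a network trained on it would favor accuracy in dry atmospheres.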

4.3. Cloud Liquid Water Path 

For LWP, the r.m.s. error is 0.08 kg/m² globally. As
expected, the error is larger in areas of high
emissivities where the contrast between the land surface and
the cloud is smaller. Even in areas of low emissivities
(0.85 < emissivity 19H < 0.9), the accuracy of the retrieval
is not suitable for detection of the majority of clouds. The
cloud flag from ISCCP is of importance in this case to 
direct the retrieval toward the appropriate neural net- 
work. However, cloud structures with large liquid water 
path can be detected whatever the surface type; Plate 
1 shows several extended and thick clouds that are also 
present on the ISCCP images (not shown). Plate 1 does 
not show any evidence of LWP errors (discontinuities) 
related to strong emissivity gradients. 

4.4. Land Surface Emissivities 

When using a first-guess, the neural network tech- 
nique shows a good aptitude for retrieving land sur- 
face emissivities with an r.m.s. error lower than 0.008 
(0.010) globally for all channels, in clear conditions 
(cloudy conditions respectively). This is an improve- 
ment over the first-guess errors. Unaided by the first- 
guess estimate, the neural network technique does not 


4.6. Analysis of the Neural Network 
Sensitivities 

An interesting capability of the neural network tech- 
nique is that the adjoint model of the neural network is 
directly provided [Aires et al., 1999]. The computation
of this adjoint model (or neural Jacobians or neural sen- 
sitivities) is analytical and very fast. Since the neural 
network is non-linear, these Jacobians are dependent on 
the situation x . For example, the neural Jacobians in 
our example of equation (7) (an MLP network with one 
hidden layer) are: 



where σ' is the derivative of the transfer function σ.
For a more complex MLP network with more hidden 
layers, there exists a back-propagation algorithm that 
efficiently computes the neural Jacobians. The neural 
Jacobian concept is a very powerful tool since it allows 
for a statistical estimation of the multivariate and non- 
linear sensitivities between input and output variables 
in the model under study [Aires and Rossow, 2000].
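
A minimal sketch of the analytical neural Jacobian for a one-hidden-layer MLP follows. It assumes a linear output layer and no bias terms, so that the derivative of output k with respect to input i is the sum over hidden neurons j of w_out[k][j] · σ'(a_j) · w_hidden[j][i]; the function and variable names are hypothetical.

```python
import math
from typing import List


def sigmoid(a: float) -> float:
    """Logistic transfer function, written to avoid overflow for large |a|."""
    if a >= 0.0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def mlp_forward(y: List[float], w_hidden: List[List[float]],
                w_out: List[List[float]]) -> List[float]:
    """One sigmoid hidden layer followed by a linear output layer (assumed)."""
    hidden = [sigmoid(sum(w * yi for w, yi in zip(row, y))) for row in w_hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def mlp_jacobian(y: List[float], w_hidden: List[List[float]],
                 w_out: List[List[float]]) -> List[List[float]]:
    """Return jac[k][i] = d(output k) / d(input i) at the situation y."""
    acts = [sum(w * yi for w, yi in zip(row, y)) for row in w_hidden]
    # Derivative of the logistic function: sigma'(a) = sigma(a) (1 - sigma(a)).
    dsig = [sigmoid(a) * (1.0 - sigmoid(a)) for a in acts]
    n_hidden = len(w_hidden)
    return [[sum(w_out[k][j] * dsig[j] * w_hidden[j][i] for j in range(n_hidden))
             for i in range(len(y))]
            for k in range(len(w_out))]
```

Because σ'(a_j) depends on the inputs, the Jacobian changes from one situation to the next, which is the state dependence noted above. A finite-difference comparison against `mlp_forward` is a simple way to verify such an implementation.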

Table 5 gives the mean neural Jacobian values for 
the variables x_k and y_i for the neural network NN1
with first-guess. The neural Jacobians are normalized
by the standard deviations of the respective variables
to enable comparison of the sensitivities
between variables with different variation
characteristics. These values indicate the relative contribution of
each input in the retrieval of a given output parameter. 
The numbers correspond to mean global values which 
may mask rather different behavior in various regions 
of the globe. 
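
The averaging and normalization of the Jacobians can be sketched as below. The sketch assumes that each Jacobian element is multiplied by the standard deviation of its input and divided by the standard deviation of its output; this convention is an assumption, other normalizations are possible, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def mean_jacobian(jacobians: List[Matrix]) -> Matrix:
    """Element-wise mean of a list of Jacobians jac[k][i] of equal shape."""
    n = len(jacobians)
    rows, cols = len(jacobians[0]), len(jacobians[0][0])
    return [[sum(j[k][i] for j in jacobians) / n for i in range(cols)]
            for k in range(rows)]


def normalize_jacobian(jac: Matrix, std_in: List[float],
                       std_out: List[float]) -> Matrix:
    """Scale jac[k][i] by std_in[i] / std_out[k] to make it dimensionless."""
    return [[jac[k][i] * std_in[i] / std_out[k] for i in range(len(std_in))]
            for k in range(len(std_out))]
```

With this scaling, a value of 1 means that a one-standard-deviation change of the input moves the output by one of its own standard deviations, which makes inputs with different units comparable.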

Figure 6 presents some of the normalized neural Jaco- 
bians for the surface temperature and the water vapor 
for three ranges of Em at 19 GHz H polarization. De- 
pending on the surface emissivity, the sensitivity of Ts 
to different inputs changes from larger sensitivity to 19 
GHz vertical polarization for high emissivities to larger 
sensitivity to the 85 GHz observations and the first- 
guess information at low emissivities (Figure 6a). For 
WV retrieval, very different regimes are observed for
low and high water vapor amounts (Figure 6b), from
larger sensitivity to the 85 GHz channel horizontal po- 
larization for high water vapor amount to smaller sen- 
sitivity for low water vapor contents. The same trend 
is observed at 22 GHz. We have already commented on 
the differences between local and global correlations in 
Section 3.3. In contrast to a linear regression-type
algorithm that fits a mean state mapping between inputs

scheme that includes first-guess information. Its
potential has been tested in the complex and ill-conditioned
problem of inversion of SSM/I microwave observations 
over land. A data base to train the neural network 
is derived from a global collection of coincident surface 
and atmospheric parameters, extracted from the NCEP 
reanalysis, from the ISCCP data, and from microwave 
emissivity atlases previously calculated. The introduc- 
tion of the first-guess information into the neural net- 
work has a considerable impact on the results compared 
to the network without first-guess. 

The r.m.s. error of the surface temperature retrieval 
is 1.3 K in clear sky conditions and 1.6 K in cloudy 
scenes over the globe. Microwave land surface temper- 
ature retrieval presents a very attractive complement to 
the infra-red estimates in cloudy areas. By combining 
both measurements as we have done, a complete (clear 
and cloudy days) time record of land surface temper- 
ature can be produced. Water vapor is retrieved with 
an r.m.s. error of 3.8 kg/m² in clear conditions and
4.9 kg/m² in cloudy situations. The r.m.s. error in
liquid water path is 0.08 kg/m². The surface emissivities
are retrieved with an accuracy of better than 0.008 in 
clear conditions and 0.010 in cloudy conditions, both 
improvements on the original first-guess. 

The analysis methodology presented here and
compared to the better-known variational assimilation
technique provides an illustration of a more general
approach to the analysis of high-volume, multi-wavelength
satellite observations that may have great potential.
The common practice of isolating one variable at a time
from such datasets breaks correlations among the
measurements and among the retrieved quantities. The
variational approach goes a step further by obtaining
simultaneous retrievals of many quantities from
multiple measurements; however, as usually implemented,
the variational analysis still does not account for
correlations of variables. The neural network approach is not
only able to accommodate strongly non-linear
relationships, but also is able to benefit from the correlations
to improve the retrievals. The neural network approach 
also requires much less computation than the varia- 
tional assimilation approach. That the two methods are 
conceptually close, as we have shown, puts the neural 
network approach on the same theoretical ground as the 
better-studied variational analysis methods. However, 
the fact that a simple neural network has been shown 
to provide a statistical fit to any function suggests that 
what the trained network is doing is simulating (statis- 
tically) the equations of the physical model, in this case 
an inverse radiative transfer model. Thus, despite use 


Appendix A: Notation 

$x$  vector of physical variables to retrieve
$\hat{x}$  estimate of $x$
$x^b$  first-guess a priori information for $x$
$x_n$  nth estimate of $x$ in variational method
$\varepsilon = x^b - x$,  first-guess error
$y(x)$  radiative transfer function for the physical variable $x$ (also a vector)
$y^o$  SSM/I brightness temperature observations
$\eta$  SSM/I instrumental noise
$P$  generic probability distribution function
$P_\eta(\eta)$  probability distribution function of $\eta$
$P_\varepsilon(\varepsilon)$  probability distribution function of $\varepsilon$
$H(x)$  derivative of $y$ with respect to $x$
$A(x)$  covariance matrix of retrieval error estimates in variational method
$B = \langle \varepsilon^t \cdot \varepsilon \rangle$,  covariance matrix of the first-guess errors
$E = \langle \eta^t \cdot \eta \rangle$,  covariance matrix of the measurement errors
$F$  covariance matrix of the radiative transfer model errors
$\mathrm{E}[\cdot]$  expectation operator
$a_i$  activity of neuron $i$
$\sigma$  sigmoid function of the neural network
$z_i$  output of the neuron $i$
$w_{ij}$  synaptic weight between neuron $i$ and neuron $j$
$g_W$  neural network model
$W = \{w_{ij}\}$,  the set of the parameters of the neural network
$y_i$  neural network input value on neuron $i$
$x_k$  neural network output value on neuron $k$
$\mathcal{B}$  dataset sampling the probability distribution functions
$D$  generic distance
$D_E$  Euclidean distance
$C_1(W)$  theoretical quality criterion for classical neural network learning phase
$\tilde{C}_1(W)$  practical quality criterion for classical neural network learning phase
$C_2(W)$  theoretical quality criterion for classical neural network learning phase with first-guess information
$\tilde{C}_2(W)$  practical quality criterion for classical neural network learning phase with first-guess information
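
To make the neural network entries of this notation concrete, the following minimal sketch evaluates a one-hidden-layer network $g_W$: each hidden neuron forms its activity $a_i$ as a weighted sum of its inputs and emits $z_i = \sigma(a_i)$. The layer sizes, the random weights, the bias terms and the linear output layer are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np

def sigmoid(a):
    """Sigmoid function sigma applied to the neuron activity a."""
    return 1.0 / (1.0 + np.exp(-a))

def g_w(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer network (hypothetical architecture)."""
    a = w_hidden @ inputs + b_hidden   # activities a_i of the hidden neurons
    z = sigmoid(a)                     # outputs z_i = sigma(a_i)
    return w_out @ z + b_out           # linear output neurons x_k (assumption)

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 5, 2        # arbitrary illustrative sizes
w_hidden = rng.normal(size=(n_hidden, n_in))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(size=(n_out, n_hidden))
b_out = np.zeros(n_out)

# One evaluation of the network on a random input vector.
x_hat = g_w(rng.normal(size=n_in), w_hidden, b_hidden, w_out, b_out)
```

In the retrieval setting, the input vector would hold the observations and the first-guess values, and the output vector the retrieved variables.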


Appendix B: The 1-D Variational Scheme

This method is described by Rodgers [1976] and by
Eyre [1989]. The unified notation of Ide et al. [1997]


Using Bayes theorem, we can rewrite the conditional 
probability in (B3) as 


$$P(x \mid y^o, x^b) = \frac{P(x, y^o, x^b)}{P(y^o, x^b)} = \frac{P(y^o, x^b \mid x)\,P(x)}{P(y^o, x^b)} \qquad \text{(B5)}$$


It is often assumed, even if it is not always the case, that
$y^o$ and $x^b$, the direct and virtual (first-guess)
measurements, are independent. In that case, we can expand
the corresponding joint probability distribution functions
using Bayes' theorem:


$$P(x \mid y^o, x^b) = \frac{P(y^o \mid x)\,P(x^b \mid x)\,P(x)}{P(y^o)\,P(x^b)} \qquad \text{(B6)}$$
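
As a worked illustration of (B6), consider a scalar toy case with a uniform $P(x)$, a linear direct model and Gaussian observation and first-guess errors. The sketch below maximizes the product $P(y^o \mid x)\,P(x^b \mid x)$ on a grid and compares the result with the precision-weighted average that the Gaussian case admits in closed form. The model `h`, the error standard deviations and the numerical values are invented for illustration only.

```python
import numpy as np

h = 2.0                          # hypothetical linear direct model y(x) = h * x
sigma_obs, sigma_fg = 0.5, 1.0   # assumed error standard deviations
y_obs, x_b = 4.2, 1.5            # invented observation and first guess

# Log of P(y_obs | x) * P(x_b | x), up to an additive constant, on a fine grid.
x = np.linspace(-5.0, 10.0, 150001)
log_post = (-0.5 * ((y_obs - h * x) / sigma_obs) ** 2
            - 0.5 * ((x_b - x) / sigma_fg) ** 2)
x_map_grid = x[np.argmax(log_post)]

# Closed form for the Gaussian case: precision-weighted combination
# of the observation and the first guess.
prec_obs = h**2 / sigma_obs**2
prec_fg = 1.0 / sigma_fg**2
x_map_exact = (h * y_obs / sigma_obs**2 + x_b / sigma_fg**2) / (prec_obs + prec_fg)
```

The two estimates agree to within the grid spacing, which shows how the first guess acts as an additional, virtual measurement weighted by its own error variance.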


We want to maximize this probability with respect to
$x$. If the probability distribution $P(x)$ of the physical
variables $x$ is available, it is possible to use it in the
general context of Bayesian estimation. If this pdf is
Gaussian, this would correspond to the addition of a
term $\frac{1}{2}[x - \bar{x}]^t \cdot B^{-1} \cdot [x - \bar{x}]$ in (B8),
where $\bar{x}$ is the mean state of the physical variables
and $B$ is the covariance matrix of the physical variables.
This approach is not used in general in variational assimilation.

If no a priori information on the distribution $P(x)$ is
available, this distribution is considered to be uniformly
distributed (i.e., no information), so this term can be
neglected during the maximization process. The two
probabilities $P(y^o)$ and $P(x^b)$ are not dependent on $x$,
so they can also be neglected. The maximum likelihood
estimator is then obtained at the minimum of minus
the log of the two remaining probabilities. Assuming
that the minimum is unique, the optimal solution is
characterized by

$$\frac{\partial \log\left[P(y^o \mid x)\,P(x^b \mid x)\right]}{\partial x} = 0 \qquad \text{(B7)}$$

These probabilities need to be rewritten in order to
extract the independent random variables involved in
the model. Note that $P(y^o \mid x) = P(y^o \mid y(x))$, since the
theoretical radiative transfer function $y$ is not a
stochastic function. So $P(y^o \mid x) = P_\eta(y^o - y)$, where
$P_\eta$ is the probability distribution function of the
instrumental noise and the forward model error. Furthermore,
$P(y^o \mid x) = P_\eta[H(x_n)(x - x_n) + (y(x_n) - y^o)]$, using
relation (B1). Also, $P(x^b \mid x) = P_\varepsilon(x^b - x)$, where
$P_\varepsilon$ is the probability distribution function of the
first-guess error $\varepsilon$.

Assuming that the errors in the observations, the di- 
rect radiative transfer model, and the a priori first-guess 
information are unbiased, uncorrelated, and have Gaus- 
sian distributions, expression (B7) is equivalent to 


two pieces of information. If these matrices are not 
sufficiently precise, or if the variability of the matrices 
with atmospheric situations is not sufficiently sampled, 
an "empirical" weight has to be determined.
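
The iterative character of the variational scheme can be sketched as follows. The example assumes the standard Gaussian cost function of optimal estimation, a first-guess term weighted by $B^{-1}$ plus an observation term weighted by $R^{-1}$ with $R = E + F$, and minimizes it by Gauss-Newton iterations that re-linearize the direct model around each estimate $x_n$. The toy `forward` model, its `jacobian`, and all numerical values are hypothetical stand-ins for the radiative transfer model and the real error statistics.

```python
import numpy as np

def forward(x):
    """Hypothetical non-linear direct model y(x), standing in for the RTM."""
    return np.array([x[0] + 0.1 * x[1] ** 2,
                     x[0] * x[1],
                     np.exp(0.1 * x[0]) + x[1]])

def jacobian(x):
    """H(x): analytic derivative of forward() with respect to x."""
    return np.array([[1.0, 0.2 * x[1]],
                     [x[1], x[0]],
                     [0.1 * np.exp(0.1 * x[0]), 1.0]])

def one_d_var(y_obs, x_b, B, R, n_iter=20):
    """Gauss-Newton minimization of the assumed Gaussian cost function."""
    x_n = x_b.copy()
    for _ in range(n_iter):
        H = jacobian(x_n)                             # local linearization
        K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)  # gain matrix
        x_n = x_b + K @ (y_obs - forward(x_n) - H @ (x_b - x_n))
    return x_n

x_true = np.array([2.0, 1.0])          # invented true state
x_b = np.array([2.5, 0.6])             # invented first guess
B = np.diag([0.5 ** 2, 0.5 ** 2])      # assumed first-guess error covariance
R = np.diag([0.05 ** 2] * 3)           # assumed E + F
y_obs = forward(x_true)                # noise-free observation, for clarity

x_hat = one_d_var(y_obs, x_b, B, R)
```

The relative sizes of `B` and `R` set the balance between the two pieces of information: enlarging `R` pulls `x_hat` back toward the first guess, while enlarging `B` lets the observations dominate.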
Table 1. Comparison of the variational and neural network inversion schemes.

| | Variational method | Neural network inversion |
|---|---|---|
| Observation / measurement | | |
| First-guess a priori information | | |
| Retrieved variable | | |
| Direct model used | RTM, used during the inversion | RTM, used during the construction of $\mathcal{B}$ if no collocated dataset exists |
| Inverse model | linearized locally | non-linear, global |
| Model | $y(x) = y(x_n) + H(x_n)(x - x_n)$ | non-linear: $x = g_W(x^b, y^o)$ |
| Quality criterion | $\max_x P(x \mid y^o, x^b)$ | $\iint D_E\big(g_W(x^b, y^o),\, x\big)\, P(x, y^o, x^b)$ |
| Dataset of observations | Used to estimate the first-guess error covariance matrix $B$ | Used to sample the pdfs |
| Direct model errors | Assumed to be Gaussian, with error covariance matrix $F$ | Already sampled in the dataset, if $\mathcal{B}$ is simulated by a RT model |
| First-guess error | Assumed to be Gaussian, with error covariance matrix $B$ | No constraint, simulated using true and first-guess solution datasets |
| Observation error | Assumed to be Gaussian, with error covariance matrix $E$ | No constraint, depends on instrument, supposed to be Gaussian in this study, $E$ |
| Inversion type | Local inversion: inversion process for each observation | Global inversion: estimation of the inverse model once and for all |
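
The neural network column of Table 1 can be illustrated with a toy global inversion. The sketch samples a dataset of true states, simulated observations and first guesses, then fits two inverse models, one fed with the observation alone and one fed with the observation and the first guess. The direct model $y = x^2$ is deliberately ambiguous in sign, so the first guess is what removes the non-uniqueness. For brevity only the output weights are fitted, by least squares, with the hidden weights fixed at random values; this is a simplification chosen here, not the learning algorithm of the study, and every numerical value is invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    """Sample a toy dataset: true state, observation, first guess."""
    x = rng.uniform(-2.0, 2.0, n)               # true state
    y_obs = x ** 2 + rng.normal(0.0, 0.05, n)   # hypothetical direct model + noise
    x_b = x + rng.normal(0.0, 0.5, n)           # first guess = truth + error
    return x, y_obs, x_b

def hidden(inputs, w, b):
    """Sigmoid hidden layer, plus a constant column for the output bias."""
    z = 1.0 / (1.0 + np.exp(-(inputs @ w + b)))
    return np.column_stack([z, np.ones(len(z))])

def fit_and_score(train_in, train_x, test_in, test_x, n_hidden=40):
    """Fit output weights by least squares; return the test r.m.s. error."""
    w = rng.normal(size=(train_in.shape[1], n_hidden))   # fixed random weights
    b = rng.normal(size=n_hidden)
    w_out, *_ = np.linalg.lstsq(hidden(train_in, w, b), train_x, rcond=None)
    err = hidden(test_in, w, b) @ w_out - test_x
    return np.sqrt(np.mean(err ** 2))

x_tr, y_tr, xb_tr = simulate(4000)
x_te, y_te, xb_te = simulate(1000)

rmse_without = fit_and_score(y_tr[:, None], x_tr, y_te[:, None], x_te)
rmse_with = fit_and_score(np.column_stack([xb_tr, y_tr]), x_tr,
                          np.column_stack([xb_te, y_te]), x_te)
```

Once fitted, the inverse model is applied to new observations with a single forward pass, which is the sense in which the inversion is estimated once and for all.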


Table 3. R.M.S. error results for first-guess and retrievals.

| | Observation or first-guess errors | NN1 clear, without first-guess | NN1 clear, with first-guess | NN2 cloudy, without first-guess | NN2 cloudy, with first-guess |
|---|---|---|---|---|---|
| TbSSMI 19 GHz V (K) | 0.60 | | | | |
| TbSSMI 19 GHz H (K) | 0.60 | | | | |
| TbSSMI 22 GHz V (K) | 0.60 | | | | |
| TbSSMI 37 GHz V (K) | 0.60 | | | | |
| TbSSMI 37 GHz H (K) | 0.60 | | | | |
| TbSSMI 85 GHz V (K) | 0.60 | | | | |
| TbSSMI 85 GHz H (K) | 0.60 | | | | |
| Ta^a (K) | 3.00 | | | | |
| Tc^b (K) | 2.00 | | | | |
| Ts^b (K) | 4.00 | 3.47 | 1.34 | 3.31 | 1.57 |
| LWP^b (kg.m-2) | | | | 0.09 | 0.08 |
| WV^a (kg.m-2) | 40 % | 5.33 | 3.83 | 6.86 | 4.90 |
| Em 19 GHz V | 0.016 | 0.012 | 0.004 | 0.012 | 0.006 |
| Em 19 GHz H | 0.018 | 0.011 | 0.004 | 0.012 | 0.006 |
| Em 22 GHz V | 0.018 | 0.013 | 0.005 | 0.013 | 0.006 |
| Em 37 GHz V | 0.015 | 0.012 | 0.004 | 0.012 | 0.006 |
| Em 37 GHz H | 0.018 | 0.011 | 0.005 | 0.013 | 0.006 |
| Em 85 GHz V | 0.020 | 0.015 | 0.006 | 0.016 | 0.009 |
| Em 85 GHz H | 0.023 | 0.016 | 0.008 | 0.018 | 0.010 |

^a NCEP.
^b ISCCP.


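Table 5. Global mean neural sensitivities.

| | Tsurf | Vap-int | Em1 | Em2 | Em3 | Em4 | Em5 | Em6 | Em7 |
|---|---|---|---|---|---|---|---|---|---|
| Tsurf | .17 | -.13 | -.17 | -.11 | -.16 | -.19 | -.10 | -.12 | -.06 |
| Vap-int | -.04 | .33 | .04 | .00 | .04 | .03 | -.02 | -.04 | -.08 |
| TB1 | .21 | .18 | .58 | .02 | .47 | .13 | -.21 | -.19 | -.17 |
| TB2 | .14 | .32 | -.04 | .88 | -.17 | -.38 | .09 | -.22 | -.30 |
| TB3 | .09 | -.78 | .05 | -.09 | .16 | -.24 | -.09 | -.57 | -.24 |
| TB4 | .21 | -.04 | .17 | -.30 | .10 | .72 | .05 | .50 | -.03 |
| TB5 | .28 | -.95 | -.35 | .19 | -.26 | .04 | .79 | -.22 | .64 |
| TB6 | .25 | -.20 | -.38 | -.13 | -.30 | -.09 | -.28 | .89 | .04 |
| TB7 | -.21 | 2.30 | .03 | -.22 | .08 | -.17 | -.03 | -.21 | .36 |
| Em1 | -.12 | .06 | .14 | .08 | .15 | .15 | .07 | .13 | .07 |
| Em2 | -.12 | -.02 | .13 | .11 | .14 | .15 | .10 | .15 | .10 |
| Em3 | -.09 | .05 | .11 | .06 | .14 | .12 | .06 | .14 | .08 |
| Em4 | -.10 | .02 | .11 | .07 | .12 | .14 | .08 | .14 | .07 |
| Em5 | -.12 | -.05 | .12 | .10 | .14 | .16 | .11 | .16 | .12 |
| Em6 | -.05 | -.05 | .06 | .05 | .08 | .08 | .05 | .17 | .11 |
| Em7 | -.05 | -.15 | .06 | .06 | .09 | .09 | .08 | .20 | .19 |
| Tlav | -.03 | .07 | .00 | .00 | -.01 | -.01 | -.01 | -.06 | -.03 |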
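
A table of mean sensitivities such as Table 5 can be produced, in principle, by differentiating the inverse model with respect to each input and averaging over a dataset. The sketch below assumes that a sensitivity is the partial derivative of an output with respect to an input, estimated by central finite differences; any normalization applied in this study is not reproduced. The function `toy_model` is a hypothetical stand-in for the trained network, and the layout (rows for inputs, columns for outputs) is an assumption.

```python
import numpy as np

def jacobian_fd(g, u, step=1e-5):
    """Central finite-difference Jacobian; row i holds d g / d u_i."""
    u = np.asarray(u, dtype=float)
    rows = []
    for i in range(u.size):
        du = np.zeros_like(u)
        du[i] = step
        rows.append((g(u + du) - g(u - du)) / (2.0 * step))
    return np.array(rows)              # shape (n_inputs, n_outputs)

def mean_sensitivities(g, samples):
    """Average the Jacobian of g over a set of input samples."""
    return np.mean([jacobian_fd(g, u) for u in samples], axis=0)

def toy_model(u):
    """Hypothetical stand-in for the trained network g_W."""
    return np.array([np.tanh(u[0]) + 0.5 * u[1], u[0] * u[1]])

rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 2))    # invented input dataset
S = mean_sensitivities(toy_model, samples)
```

Because the derivatives depend on the input state for a non-linear model, the average is taken over the whole dataset rather than at a single reference point.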


Plate 1. Retrieved fields of a) Ts in K, b) Em 19 GHz
horizontal polarization, c) WV in kg/m^2, and d) LWP
in kg/m^2 for June 11, 1993, from SSM/I observations
with the F10 and F11 satellites.
